Sequential Pattern Mining by Pattern-Growth: Principles and Extensions
Abstract
Sequential pattern mining is an important data mining problem with broad applications. It is also a challenging one, since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP [30], a horizontal format-based sequential pattern mining method, and (ii) SPADE [36], a vertical format-based method; and (2) a sequential pattern-growth method, represented by PrefixSpan [26] and its further extensions, such as CloSpan for mining closed sequential patterns [35]. In this study, we systematically introduce the pattern-growth methodology and study its principles and extensions. We first introduce two pattern-growth algorithms, FreeSpan [11] and PrefixSpan [26], for efficient sequential pattern mining. Then we introduce CloSpan for mining closed sequential patterns. Their relative performance on large sequence databases is presented and analyzed. Extensions of these methods for (1) mining constraint-based sequential patterns, (2) mining multi-level, multi-dimensional sequential patterns, (3) mining top-k closed sequential patterns, and (4) applications in bio-sequence pattern analysis and sequence clustering are also discussed.
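The pattern-growth idea named in the abstract can be illustrated with a minimal sketch in the spirit of PrefixSpan. It is not the authors' implementation: the function name `prefixspan`, its parameters, and the restriction to sequences of single items (rather than sequences of itemsets, as in the general problem) are assumptions made to keep the example short. Each frequent prefix is extended only with items that are frequent in its projected database, so no candidate sequences are generated and tested against the full database.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for all frequent sequential patterns.

    Simplifying assumption (not from the paper): every sequence is a list
    of single items, not a list of itemsets.
    """
    results = {}

    def grow(prefix, projected):
        # `projected` holds (sequence_index, start_offset) pairs: for each
        # sequence containing `prefix`, the suffix that follows the earliest
        # match of the prefix.
        support = defaultdict(int)
        for seq_idx, start in projected:
            # Count each item at most once per sequence.
            for item in set(sequences[seq_idx][start:]):
                support[item] += 1

        for item in sorted(support):
            count = support[item]
            if count < min_support:
                continue  # infrequent extension: prune this branch
            pattern = prefix + (item,)
            results[pattern] = count

            # Build the projected database for the extended prefix.
            new_projected = []
            for seq_idx, start in projected:
                seq = sequences[seq_idx]
                try:
                    pos = seq.index(item, start)
                except ValueError:
                    continue  # this sequence does not contain the extension
                new_projected.append((seq_idx, pos + 1))
            grow(pattern, new_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results
```

Calling `prefixspan([list("abc"), list("acb"), list("ab"), list("bc")], 2)` returns a dictionary mapping each frequent pattern, as a tuple of items, to the number of sequences that contain it. Storing offsets instead of copied suffixes is a design choice of this sketch that keeps each projection cheap.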
Related resources
Comparative Study of Various Sequential Pattern Mining Algorithms
Sequential pattern mining represents an important class of data mining problems with a wide range of applications. It is one of the most challenging problems because it requires careful scanning of a combinatorially large number of possible subsequence patterns. Broadly, sequential pattern mining algorithms can be classified into three types, namely Apriori-based approaches, pattern-growth a...
Supporting Interactive Sequential Pattern Discovery in Databases
One of the most important data mining problems is discovery of sequential patterns. Sequential pattern mining consists in discovering all frequently occurring subsequences in a collection of data sequences. This paper discusses several issues concerning possible extensions to traditional database management systems required to support sequential pattern discovery: a sequential pattern query lan...
COBRA: Closed Sequential Pattern Mining Using Bi-phase Reduction Approach
In this work, we study the problem of closed sequential pattern mining. We propose a novel approach that extends a frequent sequence with closed itemsets instead of single items. The motivation is that closed sequential patterns are composed of only closed itemsets. Hence, unnecessary item extensions that generate non-closed sequential patterns can be avoided. Experimental evaluation shows t...
A Review Paper on Sequential Pattern Mining Algorithms
Sequential pattern mining and sequential rule mining are important data mining tasks with wide application. They are used to find frequently occurring ordered events or subsequences as patterns in a sequence database. A sequence is an ordered list of events. One sequence is a subsequence of another if each of its itemsets is contained in a corresponding itemset of the other, in the same order. Sequential pattern mining is used in various domains such a...
The Impact of the Pattern-Growth Ordering on the Performances of Pattern Growth-Based Sequential Pattern Mining Algorithms
Sequential pattern mining is an efficient technique for discovering recurring structures or patterns in very large datasets. It is widely addressed by the data mining community and has a very large field of applications, such as cross-marketing, DNA analysis, web log analysis, user behavior, sensor data, etc. Sequential pattern mining aims at extracting a set of attributes, shared across time among a...
Publication date: 2005